{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# TiDB Property Graph Index\n",
    "\n",
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/property_graph/property_graph_tidb.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "\n",
    "\n",
    "TiDB is a distributed SQL database, it is MySQL compatible and features horizontal scalability, strong consistency, and high availability. Currently it only supports Vector Search in [TiDB Cloud Serverless](https://tidb.cloud/ai).\n",
    "\n",
    "In this nodebook, we will cover how to connect to a TiDB Serverless cluster and create a property graph index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index llama-index-graph-stores-tidb"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prepare TiDB Serverless Cluster\n",
    "\n",
    "Sign up for [TiDB Cloud](https://tidb.cloud/) and create a TiDB Serverless cluster with Vector Search enabled.\n",
    "\n",
    "Get the db connection string from the Cluster Details page, for example:\n",
    "\n",
    "```\n",
    "mysql+pymysql://user:password@host:4000/dbname?ssl_verify_cert=true&ssl_verify_identity=true\n",
    "```\n",
    "\n",
    "TiDB Serverless requires TSL connection when using public endpoint."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Env Setup\n",
    "\n",
    "We need just a few environment setups to get started."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-proj-...\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import SimpleDirectoryReader\n",
    "\n",
    "documents = SimpleDirectoryReader(\"./data/paul_graham/\").load_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Index Construction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9a7bffce7add4c59ae0b7daa6945cfae",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Extracting paths from text with schema: 100%|██████████| 22/22 [00:44<00:00,  2.02s/it]\n",
      "Generating embeddings: 100%|██████████| 1/1 [00:01<00:00,  1.66s/it]\n",
      "Generating embeddings: 100%|██████████| 3/3 [00:01<00:00,  1.51it/s]\n"
     ]
    }
   ],
   "source": [
    "from llama_index.core import PropertyGraphIndex\n",
    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
    "from llama_index.llms.openai import OpenAI\n",
    "from llama_index.core.indices.property_graph import SchemaLLMPathExtractor\n",
    "from llama_index.graph_stores.tidb import TiDBPropertyGraphStore\n",
    "\n",
    "graph_store = TiDBPropertyGraphStore(\n",
    "    db_connection_string=\"mysql+pymysql://user:password@host:4000/dbname?ssl_verify_cert=true&ssl_verify_identity=true\",\n",
    "    drop_existing_table=True,\n",
    ")\n",
    "\n",
    "# Note: it can take a while to index the documents, especially if you have a large number of documents.\n",
    "# Especially if you are connecting TiDB Serverless to a public endpoint, it depends on the distance between your server location and the TiDB serverless location.\n",
    "index = PropertyGraphIndex.from_documents(\n",
    "    documents,\n",
    "    embed_model=OpenAIEmbedding(model_name=\"text-embedding-3-small\"),\n",
    "    kg_extractors=[\n",
    "        SchemaLLMPathExtractor(\n",
    "            llm=OpenAI(model=\"gpt-3.5-turbo\", temperature=0.0)\n",
    "        )\n",
    "    ],\n",
    "    property_graph_store=graph_store,\n",
    "    show_progress=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Querying and Retrieval"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Interleaf -> USED_FOR -> software for creating documents\n",
      "Interleaf -> HAS -> scripting language\n",
      "Interleaf -> HAS -> Lisp\n",
      "Viaweb -> USED_FOR -> site builders\n",
      "Viaweb -> USED_FOR -> ecommerce software\n",
      "Viaweb -> USED_FOR -> retail\n",
      "Viaweb -> USED_FOR -> business\n",
      "Viaweb -> IS_A -> application service provider\n",
      "Viaweb -> IS_A -> software as a service\n"
     ]
    }
   ],
   "source": [
    "retriever = index.as_retriever(\n",
    "    include_text=False,  # include source text in returned nodes, default True\n",
    ")\n",
    "\n",
    "nodes = retriever.retrieve(\"What happened at Interleaf and Viaweb?\")\n",
    "\n",
    "for node in nodes:\n",
    "    print(node.text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Interleaf added a scripting language inspired by Emacs, which was a dialect of Lisp. The individual who worked at Interleaf found the Lisp implementation challenging due to their lack of knowledge in C. They also learned various lessons about technology companies and office dynamics during their time at Interleaf. On the other hand, Viaweb was used for site builders, ecommerce software, retail, and business purposes. The work on Viaweb and Y Combinator initially seemed unimpressive and lacked prestige, but the individual found success by working on less prestigious projects.\n"
     ]
    }
   ],
   "source": [
    "query_engine = index.as_query_engine(include_text=True)\n",
    "\n",
    "response = query_engine.query(\"What happened at Interleaf and Viaweb?\")\n",
    "\n",
    "print(str(response))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading from an existing Graph\n",
    "\n",
    "If you have an existing graph (either created with LlamaIndex or otherwise), we can connect to and use it!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import PropertyGraphIndex\n",
    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
    "from llama_index.llms.openai import OpenAI\n",
    "from llama_index.core.indices.property_graph import SchemaLLMPathExtractor\n",
    "from llama_index.graph_stores.tidb import TiDBPropertyGraphStore\n",
    "\n",
    "graph_store = TiDBPropertyGraphStore(\n",
    "    db_connection_string=\"mysql+pymysql://user:password@host:4000/dbname?ssl_verify_cert=true&ssl_verify_identity=true\",\n",
    ")\n",
    "\n",
    "index = PropertyGraphIndex.from_existing(\n",
    "    property_graph_store=graph_store,\n",
    "    llm=OpenAI(model=\"gpt-3.5-turbo\", temperature=0.3),\n",
    "    embed_model=OpenAIEmbedding(model_name=\"text-embedding-3-small\"),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From here, we can still insert more documents!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import Document\n",
    "\n",
    "document = Document(text=\"LlamaIndex is great!\")\n",
    "\n",
    "index.insert(document)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Llamaindex -> Is -> Great\n"
     ]
    }
   ],
   "source": [
    "nodes = index.as_retriever(include_text=False).retrieve(\"LlamaIndex\")\n",
    "\n",
    "print(nodes[0].text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For full details on construction, retrieval, querying of a property graph, see the [full docs page](/../../module_guides/indexing/lpg_index_guide)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
